
Upgrade to llama stack 0.6.0#149

Open
mkristian wants to merge 4 commits into containers:main from mkristian:upgrade-to-llama-stack-0.5.1

Conversation

@mkristian

@mkristian mkristian commented Mar 6, 2026

Basically a copy and paste of https://github.com/llamastack/llama-stack/tree/main/src/llama_stack/providers/remote/inference/llama_cpp_server, as it is not released yet.

Adjusted the run.yml by taking the current starter run.yml from llama-stack and adapting it, more or less following what was there before.

Removed the obsolete ramalama dependency, eliminating the circular dependency on ramalama.

This renders #148 obsolete.

It fixes containers/ramalama#2480

Summary by Sourcery

Upgrade the Ramalama remote inference provider to align with llama-stack 0.5.1 and its new remote provider infrastructure.

New Features:

  • Introduce an OpenAI-compatible Ramalama inference adapter based on the shared OpenAIMixin and RemoteInferenceProviderConfig.
  • Add starter-style run configuration with expanded providers, storage, vector store, safety, and batching support.

Bug Fixes:

  • Resolve circular dependency on the ramalama package by removing it from runtime requirements.

Enhancements:

  • Refactor the Ramalama adapter to delegate OpenAI-compatible behavior to the shared OpenAIMixin and use configuration-driven base URLs.
  • Modernize provider registration to use the new llama-stack-api datatypes and RemoteProviderSpec.
  • Update Ramalama implementation config to support validated base URLs and schema-based configuration.
  • Simplify installation by only copying the run configuration file instead of provider descriptors.
  • Refresh dependency pins to versions compatible with llama-stack 0.5.1, including fastapi, openai, opentelemetry, mcp, and milvus-lite.

Build:

  • Update core dependencies to llama-stack 0.5.1 and llama-stack-api 0.5.1 and add a setuptools upper bound to ensure compatibility.

@sourcery-ai

sourcery-ai bot commented Mar 6, 2026

Reviewer's Guide

Updates the Ramalama remote inference provider to align with llama-stack 0.5.1 by refactoring the adapter to use the shared OpenAIMixin, migrating configuration and provider spec to the new llama-stack-api types, replacing the run configuration with the starter distro template, and updating Python dependencies while removing the obsolete ramalama package and providers.d wiring.

File-Level Changes

Change Details Files
Refactor Ramalama inference adapter to use shared OpenAI-based remote inference mixin instead of custom implementation.
  • Replace custom Inference/ModelsProtocolPrivate implementation with RamalamaInferenceAdapter subclassing OpenAIMixin.
  • Inject RamalamaImplConfig via typed config instead of raw URL argument.
  • Implement get_api_key to expose optional auth_credential secret and get_base_url to supply the remote base URL from config.
  • Remove local OpenAI request/response conversion helpers and model registry plumbing now handled by OpenAIMixin.
src/ramalama_stack/ramalama_adapter.py
src/ramalama_stack/__init__.py
Migrate provider configuration and registration to llama-stack 0.5.1 conventions and llama-stack-api datatypes.
  • Redefine RamalamaImplConfig to extend RemoteInferenceProviderConfig, expose base_url with default /v1 endpoint, and annotate it with json_schema_type for discovery.
  • Update sample_run_config to emit base_url instead of url.
  • Switch provider spec to use llama_stack_api.datatypes.RemoteProviderSpec with provider_type "remote::ramalama" and fully-qualified config_class/module references.
src/ramalama_stack/config.py
src/ramalama_stack/provider.py
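The adapter and config refactor described above can be sketched in miniature. This is an illustrative sketch only: the real `RemoteInferenceProviderConfig` base class and `OpenAIMixin` come from llama-stack / llama-stack-api, so stand-in classes are used here, and the attribute names (`base_url`, `auth_credential`) are taken from the descriptions in this guide rather than the upstream source.

```python
class RemoteInferenceProviderConfig:
    """Stand-in for the llama-stack base class (assumption for illustration)."""
    auth_credential = None


class RamalamaImplConfig(RemoteInferenceProviderConfig):
    def __init__(self, base_url="http://localhost:8080/v1", auth_credential=None):
        # base_url defaults to the OpenAI-compatible /v1 endpoint
        self.base_url = base_url
        self.auth_credential = auth_credential


class RamalamaInferenceAdapter:
    """The real adapter also subclasses OpenAIMixin, which supplies the
    OpenAI-compatible request/response handling omitted here."""

    def __init__(self, config):
        self.config = config

    def get_base_url(self):
        # Supply the remote base URL from the typed config
        return self.config.base_url

    def get_api_key(self):
        # Return the optional credential as-is; None signals "no key"
        return self.config.auth_credential
```

Note that returning the credential unchanged (possibly `None`) is also what the review below recommends over a literal placeholder string.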
Replace ramalama-run.yaml with the 0.5.x starter-style distribution configuration and align storage and provider wiring.
  • Change run file versioning and distro_name to match starter distro conventions.
  • Introduce explicit providers blocks for inference (vLLM and Ramalama), vector_io, files, safety, agents, post_training, eval, datasetio, scoring, tool_runtime, and batches.
  • Define centralized storage backends and stores using kv_default/sql_default and move metadata/inference stores under storage.stores.
  • Register models, shields, tool_groups, and vector_stores using registered_resources section and configure vector_stores/safety defaults.
src/ramalama_stack/ramalama-run.yaml
Align Python dependencies with llama-stack 0.5.1, introducing llama-stack-api and related telemetry/DB libs while dropping the ramalama circular dependency.
  • Bump llama-stack to 0.5.1 and add llama-stack-api 0.5.1 with corresponding OpenTelemetry and OCI/DB-related packages.
  • Upgrade multiple transitive dependencies (e.g., aiohttp, fastapi, huggingface-hub, mcp, milvus-lite, numpy, openai, starlette) to versions compatible with llama-stack 0.5.1.
  • Remove direct ramalama dependency and all llama-stack-client-related packages, and pin setuptools<70 to satisfy upstream constraints.
requirements.txt
pyproject.toml
uv.lock
Simplify installation side effects to only install the distribution run file and stop copying provider descriptors.
  • Remove logic that copies providers.d descriptors into ~/.llama/providers.d during install.
  • Retain and slightly refactor logic that copies ramalama-run.yaml into ~/.llama/distributions/ramalama, fixing variable naming in log messages.
setup.py
Remove obsolete local provider metadata and OpenAI compatibility shims now replaced by upstream llama-stack utilities.
  • Delete local model_entries registration and OpenAI conversion utilities that are no longer used by the new OpenAIMixin-based adapter.
  • Delete the providers.d YAML descriptor for the Ramalama remote inference provider since provider registration now comes from get_provider_spec().
src/ramalama_stack/models.py
src/ramalama_stack/openai_compat.py
src/ramalama_stack/providers.d/remote/inference/ramalama.yaml

Assessment against linked issues

Issue Objective Addressed Explanation
containers/ramalama#2480 Upgrade ramalama-stack to use llama-stack version 0.5.1 (and corresponding libraries) instead of 0.2.14 so it is in sync with upstream and compatible with the latest llama-stack-client.
containers/ramalama#2480 Update the ramalama-stack adapter and configuration (including run YAML and API wiring) to the llama-stack 0.5.x provider API and OpenAI-compatible /v1 endpoint, so that the new client-server protocol works correctly.


@sourcery-ai sourcery-ai bot left a comment


Hey - I've found 5 issues, and left some high-level feedback:

  • In setup.py, the error message in the run step still references providers_dir, which is no longer defined after removing the providers copying logic and will raise a NameError; update the exception handler to log the actual variables used (e.g., run_yaml and target_dir).
  • RamalamaImplConfig defaults base_url to http://localhost:8080/v1 while ramalama-run.yaml wires RAMALAMA_URL directly into base_url without appending /v1; consider aligning these so users don’t accidentally pass a non-/v1 URL and break the OpenAI-compatible endpoint expectations.
  • In RamalamaInferenceAdapter.get_api_key, returning the literal string "NO KEY REQUIRED" for unauthenticated setups may not match what OpenAIMixin expects for a missing key; it’s safer to return None or an empty string and let the mixin handle the absence of credentials.

## Individual Comments

### Comment 1
<location path="setup.py" line_range="20-21" />
<code_context>
+            os.makedirs(target_dir, exist_ok=True)
+            shutil.copy(run_yaml, target_dir)
+            print(f"Copied {run_yaml} to {target_dir}")
         except Exception as error:
-            print(f"Failed to copy {providers_dir} to {target_dir_1}. Error: {error}")
+            print(f"Failed to copy {providers_dir} to {target_dir}. Error: {error}")
             raise

</code_context>
<issue_to_address>
**issue (bug_risk):** Exception message references `providers_dir`, which is no longer defined and will raise a `NameError` in the error path.

Since `providers_dir` is no longer defined, this error path will raise a `NameError` and hide the real copy failure. Update the message to use an existing variable that reflects what was being copied (e.g., `run_yaml`), or remove the reference entirely.
</issue_to_address>

### Comment 2
<location path="src/ramalama_stack/ramalama-run.yaml" line_range="22-31" />
<code_context>
+  - provider_id: ${env.RAMALAMA_URL:+ramalama}
</code_context>
<issue_to_address>
**issue (bug_risk):** The `ramalama` inference provider is conditional on `RAMALAMA_URL`, but the registered model always references `ramalama`.

This creates a config where the model can point to a provider that may not exist. Either make the provider unconditional (e.g., use `:=` with a default URL) or apply the same condition to the model registration so they remain consistent.
</issue_to_address>
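One way to take the first option in this comment — an unconditional provider — is to default the URL with `:=` instead of gating the provider id. A hedged sketch; the surrounding run.yaml keys are assumed from the excerpts quoted in this review:

```yaml
providers:
  inference:
  - provider_id: ramalama
    provider_type: remote::ramalama
    config:
      base_url: ${env.RAMALAMA_URL:=http://localhost:8080/v1}
```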

### Comment 3
<location path="src/ramalama_stack/ramalama-run.yaml" line_range="157-163" />
<code_context>
+    provider_id: ramalama
+    model_type: llm
+  shields:
+  - shield_id: llama-guard
+    provider_id: ${env.SAFETY_MODEL:+llama-guard}
+    provider_shield_id: ${env.SAFETY_MODEL:=}
+  - shield_id: code-scanner
+    provider_id: ${env.CODE_SCANNER_MODEL:+code-scanner}
</code_context>
<issue_to_address>
**suggestion (bug_risk):** Safety shield `provider_id`/`provider_shield_id` environment interpolation can produce an invalid or partially configured shield.

With `provider_id: ${env.SAFETY_MODEL:+llama-guard}`, if `SAFETY_MODEL` is unset, `provider_id` resolves to an empty string, and `provider_shield_id: ${env.SAFETY_MODEL:=}` also becomes empty. This creates a shield with a valid `shield_id` but no usable provider reference. If you want to fall back to the built‑in `llama-guard` when the env var is unset, either set `provider_id: llama-guard` unconditionally and use `SAFETY_MODEL` only for `provider_shield_id`, or use `:=llama-guard` so both fields default to a valid value.

```suggestion
  shields:
  - shield_id: llama-guard
    provider_id: llama-guard
    provider_shield_id: ${env.SAFETY_MODEL:=llama-guard}
  - shield_id: code-scanner
    provider_id: code-scanner
    provider_shield_id: ${env.CODE_SCANNER_MODEL:=code-scanner}
```
</issue_to_address>

### Comment 4
<location path="src/ramalama_stack/ramalama-run.yaml" line_range="177-179" />
<code_context>
 server:
   port: 8321
-external_providers_dir: ${env.EXTERNAL_PROVIDERS_DIR:=~/.llama/providers.d}
+vector_stores:
+  default_provider_id: faiss
+  default_embedding_model:
+    provider_id: sentence-transformers
+    model_id: nomic-ai/nomic-embed-text-v1.5
</code_context>
<issue_to_address>
**issue (bug_risk):** `vector_stores.default_provider_id` is set to `faiss`, but no matching `faiss` vector_io provider is configured.

Because only a conditional `milvus` `vector_io` provider is configured, this `faiss` default will not resolve unless a matching `faiss` provider is defined elsewhere. Please either point `default_provider_id` to an existing provider (e.g. `milvus`) or add a corresponding `faiss` `vector_io` provider configuration.
</issue_to_address>
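A matching `faiss` entry would let the default resolve. The config keys below are an assumption based on llama-stack's inline faiss provider and the `kv_default` backend mentioned earlier in this review, and should be checked against the 0.5.x starter distro:

```yaml
providers:
  vector_io:
  - provider_id: faiss
    provider_type: inline::faiss
    config:
      persistence:
        namespace: vector_io::faiss
        backend: kv_default
```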

### Comment 5
<location path="src/ramalama_stack/ramalama-run.yaml" line_range="1-10" />
<code_context>
+version: 2
</code_context>
<issue_to_address>
**question:** The distro is named `starter` in paths, but setup still installs under the `ramalama` distribution directory, which may be confusing or inconsistent.

This file uses `distro_name: starter` and paths under `~/.llama/distributions/starter`, while `setup.py` installs `ramalama-run.yaml` under `~/.llama/distributions/ramalama`. Please align the distro name and directory paths (all `starter` or all `ramalama`) to avoid confusion about where the distro’s state lives.
</issue_to_address>


@gemini-code-assist

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly modernizes the project's integration with the llama-stack ecosystem by upgrading to version 0.5.1. The changes streamline dependency management, remove obsolete components, and adopt a more robust and feature-rich provider infrastructure. This update ensures better compatibility, simplifies future development, and expands the capabilities of the Ramalama remote inference provider, offering a more standardized and configurable experience.

Highlights

  • Llama Stack Upgrade: Upgraded the core llama-stack dependency from version 0.2.14 to 0.5.1, aligning the project with the latest remote provider infrastructure.
  • Dependency Refactoring and Removal: Removed the ramalama dependency to resolve a circular dependency issue and updated various other package versions including milvus-lite, fastapi, openai, opentelemetry, and mcp for compatibility with llama-stack 0.5.1.
  • Simplified Provider Registration: Refactored the setup.py to no longer copy providers.d files, simplifying the installation process. Provider registration now uses the new llama-stack-api datatypes and RemoteProviderSpec.
  • OpenAI-Compatible Adapter: The RamalamaInferenceAdapter was refactored to inherit from OpenAIMixin, delegating OpenAI-compatible behavior to a shared mixin and using configuration-driven base URLs, significantly simplifying its implementation.
  • Updated Run Configuration: The ramalama-run.yaml configuration was completely revamped to align with a new 'starter' template, introducing expanded support for APIs like batches and files, and detailed configurations for storage, vector stores, and safety features.


Changelog
  • pyproject.toml
    • Updated llama-stack dependency from 0.2.14 to 0.5.1.
    • Added milvus-lite>=2.5.1 dependency.
    • Removed ramalama==0.10.1 dependency.
    • Added setuptools<70 constraint.
  • requirements.txt
    • Updated numerous dependency versions (e.g., aiohttp, aiosignal, fastapi, huggingface-hub, mcp, milvus-lite, numpy, openai, opentelemetry, pydantic, python-multipart, setuptools, starlette, typing-extensions, typing-inspection, urllib3).
    • Added new dependencies like annotated-doc, cffi, circuitbreaker, cryptography, oci, oracledb, opentelemetry-distro, opentelemetry-instrumentation, psycopg2-binary, pycparser, pyjwt, pyopenssl, pywin32, tornado.
    • Removed argcomplete, deprecated, ecdsa, pyaml, python-jose, rsa dependencies.
    • Adjusted dependency sources and comments.
  • setup.py
    • Removed the logic for copying providers.d to ~/.llama/providers.d.
    • Modified the error message for copying ramalama-run.yaml to correctly reference target_dir.
  • src/ramalama_stack/__init__.py
    • Modified get_adapter_impl to pass the full config object to RamalamaInferenceAdapter instead of just config.url.
    • Removed the await impl.initialize() call.
  • src/ramalama_stack/config.py
    • Updated RamalamaImplConfig to inherit from RemoteInferenceProviderConfig and use HttpUrl for base_url.
    • Added @json_schema_type decorator to RamalamaImplConfig.
    • Refactored sample_run_config to return a dictionary with base_url.
  • src/ramalama_stack/models.py
    • Removed the entire file, indicating a shift in how models are registered or managed.
  • src/ramalama_stack/openai_compat.py
    • Removed the entire file, suggesting that OpenAI compatibility logic is now handled by a shared mixin or upstream library.
  • src/ramalama_stack/provider.py
    • Updated get_provider_spec to use llama_stack_api.datatypes.RemoteProviderSpec instead of remote_provider_spec.
    • Added provider_type="remote::ramalama" to the RemoteProviderSpec.
  • src/ramalama_stack/providers.d/remote/inference/ramalama.yaml
    • Removed the entire file, indicating a change in how provider descriptors are handled.
  • src/ramalama_stack/ramalama-run.yaml
    • Updated version to 2 and distro_name to starter.
    • Added new API sections: batches, files.
    • Removed telemetry API section.
    • Completely restructured the providers section with new configurations for inference, vector_io, files, safety, agents, post_training, eval, datasetio, scoring, tool_runtime, and batches.
    • Introduced storage section with backends and stores configurations.
    • Added registered_resources section for models, shields, vector_dbs, datasets, scoring_fns, benchmarks, and tool_groups.
    • Added vector_stores section with detailed parameters for embedding models, file search, context prompts, annotation prompts, file ingestion, chunk retrieval, and file batching.
    • Added safety and connectors top-level sections.
    • Removed metadata_store, inference_store, models, shields, vector_dbs, datasets, scoring_fns, benchmarks, tool_groups, and external_providers_dir as top-level keys, integrating them into the new structure.
  • src/ramalama_stack/ramalama_adapter.py
    • Refactored RamalamaInferenceAdapter to inherit from OpenAIMixin.
    • Removed extensive OpenAI compatibility methods and direct AsyncOpenAI client initialization.
    • Simplified __init__ to take RamalamaImplConfig.
    • Added get_api_key and get_base_url methods, delegating to the config object.
Activity
  • The pull request description includes a detailed 'Summary by Sourcery', indicating automated analysis and summarization of the changes.



@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request upgrades llama-stack to version 0.5.1, which is a significant update. The changes correctly adapt the codebase to the new APIs and dependency requirements of llama-stack. The refactoring of RamalamaInferenceAdapter to use OpenAIMixin is a great simplification. I found one issue in the setup.py script that needs to be addressed.

@mkristian mkristian force-pushed the upgrade-to-llama-stack-0.5.1 branch 2 times, most recently from 36e61f2 to a72bcd9 Compare March 6, 2026 19:42
Signed-off-by: Christian Meier <meier.kristian@gmail.com>
@mkristian mkristian force-pushed the upgrade-to-llama-stack-0.5.1 branch from a72bcd9 to 178ef45 Compare March 8, 2026 12:16
@mkristian mkristian changed the title Upgrade to llama stack 0.5.1 Upgrade to llama stack 0.6.0 Mar 14, 2026
@mkristian
Author

I will have a look at the tests now...

Signed-off-by: Christian Meier <meier.kristian@gmail.com>
@mkristian mkristian force-pushed the upgrade-to-llama-stack-0.5.1 branch from 03e89e2 to 97d1f27 Compare March 14, 2026 15:19
@cdoern
Collaborator

cdoern commented Mar 17, 2026

test-build.sh needs to be modified to no longer check for /home/runner/.llama/providers.d not found. providers.d is the old method of installing external providers; we now opt to install via module: in the config.yaml, pointing to the installable pip package.
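The module-based wiring cdoern describes could look roughly like the fragment below. The `module:` value and config keys are assumptions to illustrate the approach and should be checked against the llama-stack external-provider documentation:

```yaml
providers:
  inference:
  - provider_id: ramalama
    provider_type: remote::ramalama
    module: ramalama_stack
    config:
      base_url: ${env.RAMALAMA_URL:=http://localhost:8080/v1}
```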

@cdoern
Collaborator

cdoern commented Mar 17, 2026

Same for the other two tests.

Signed-off-by: Christian Meier <meier.kristian@gmail.com>
Signed-off-by: Christian Meier <meier.kristian@gmail.com>
@mkristian
Author

@cdoern could you re-run the tests, or do I need to do something to trigger them?


Development

Successfully merging this pull request may close these issues.

llama-stack is quite outdated

2 participants